Assignment - VAST Challenge MC2 - Part 2

This post presents the step by step instructions that were followed to identify which GASTech employess made which purchases and identify suspicious patterns of behavior using visual analytic techniques.

Mayurapriyann Arulmozhi Baskaran https://www.linkedin.com/in/mayurapriyann/
07-25-2021
packages = c('raster','sf','tmap','tidyverse','clock','rgdal','tidytext','widyr',
             'DT','dplyr','hms','ggraph','igraph','crosstalk',
             'plotly','data.table','stringi','mapview','ggridges','networkD3',
             'htmlwidgets')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Task 2

Add the vehicle data to your analysis of the credit and loyalty card data. How does your assessment of the anomalies in question 1 change based on this new data? What discrepancies between vehicle, credit, and loyalty card data do you find? Please limit your answer to 8 images and 500 words.

Importing raster file

bgmap <- raster("data/MC2-tourist.tif")
bgmap
class      : RasterLayer 
band       : 1  (of  3  bands)
dimensions : 1595, 2706, 4316070  (nrow, ncol, ncell)
resolution : 3.16216e-05, 3.16216e-05  (x, y)
extent     : 24.82419, 24.90976, 36.04499, 36.09543  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 
source     : MC2-tourist.tif 
names      : MC2.tourist 
values     : 0, 255  (min, max)

Plotting Raster Layer

tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1, g = 2, b = 3,
         alpha = NA,
         saturation = 1,
         interpolate = TRUE,
         max.value = 255)

The map of Abila gives us a clear picture of the distance that GASTech is away from the other locations visited by the employees. From the map of Abila, the most frequented locations such as Katerina’s Cafe, Guy’s Gyros, Brew’ve Been Served were all seems to be in close proximity with GASTech. Hippokampos was the second most frequented location but surprisingly it was not present in the map. Since the top 4 frequented locations are closely located, I assume Hippokampos also should be near to GASTech.

Importing vector GIS data file

Abila_st <- st_read(dsn = "data/Geospatial",
                    layer = "Abila")
Reading layer `Abila' from data source 
  `C:\abmayur05\VisualAnalytics\_posts\2021-07-26-assignment-vast-challenge-mc2-part-2\data\Geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 3290 features and 9 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 24.82401 ymin: 36.04502 xmax: 24.90997 ymax: 36.09492
Geodetic CRS:  WGS 84

Importing Aspatial Data

gps <- read_csv("data/gps.csv")
glimpse(gps)
Rows: 685,169
Columns: 4
$ Timestamp <chr> "01/06/2014 06:28:01", "01/06/2014 06:28:01", "01/~
$ id        <dbl> 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35~
$ lat       <dbl> 36.07623, 36.07622, 36.07621, 36.07622, 36.07621, ~
$ long      <dbl> 24.87469, 24.87460, 24.87444, 24.87425, 24.87417, ~

Converting Date-Time field and ID field

gps$Timestamp <- date_time_parse(gps$Timestamp,
                                 zone = "",
                                 format = "%m/%d/%Y %H:%M:%S")
gps$id <- as_factor(gps$id)

Converting Aspatial Data Into A Simple Feature data Frame

gps_sf <- st_as_sf(gps,
                   coords = c("long","lat"),
                   crs = 4326)
gps_sf <- gps_sf %>%
  mutate(day = get_day(Timestamp),
         hour = get_hour(Timestamp),
         dayofweek = date_weekday_factor(Timestamp) ,
         date = as_date(Timestamp))
glimpse(gps_sf)
Rows: 685,169
Columns: 7
$ Timestamp <dttm> 2014-01-06 06:28:01, 2014-01-06 06:28:01, 2014-01~
$ id        <fct> 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35~
$ geometry  <POINT [°]> POINT (24.87469 36.07623), POINT (24.8746 36~
$ day       <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,~
$ hour      <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,~
$ dayofweek <ord> Mon, Mon, Mon, Mon, Mon, Mon, Mon, Mon, Mon, Mon, ~
$ date      <date> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, 2~

Creating Movement Path From GPS Points

The code chunk below joins the GPS points into movement paths by using car id, date and hour as unique identifiers.

gps_path <- gps_sf %>%
  group_by(id, date, hour) %>%
  summarize(timestamp= mean(Timestamp),
            do_union=FALSE) %>%
  st_cast("LINESTRING")

Finding the orphan lines

After getting the movement paths we do see some orphan GPS points hence finding those orphan lines.

p = npts(gps_path, by_feature = TRUE)
gps_path2 <- cbind(gps_path, p)

Removing the orphan lines

gps_path3 <- gps_path2[!(gps_path2$p==1),]

Plotting the GPS Paths

Below code chunk is used to overplot the GPS path of all the car ids onto the background tourist map.

gps_path_selected <- gps_path3
tmap_mode("view")
tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1, g = 2, b = 3,
         alpha = NA,
         saturation = 1,
         interpolate = TRUE,
         max.value = 255) +
  tm_shape(gps_path_selected) +
  tm_lines(col='red')

Previously using the heat map plot on location vs hour of transaction shows that there are some transactions recorded at early morning around 3 am - 4 am at Kronos Mart. Hence further plotting the GPS path during the time period to see if any car id had passed by that way.

gps_path_selected <- gps_path3 %>%
  filter(hour==3)
tmap_mode("view")
tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1, g = 2, b = 3,
         alpha = NA,
         saturation = 1,
         interpolate = TRUE,
         max.value = 255) +
  tm_shape(gps_path_selected) +
  tm_lines(col="red") 

It can be seen from the GPS data that no car had passed by Kronos mart or its near by locations between 3 am - 4 am. Hence the employee should have taken his personal vehicle or should have visited that particular location by a walk in order to hide his/her identity. The owners of credit card numbers ending with 8332, 9551 and 3484 are those who had made transaction on 19/01/2014 (Sunday) between 3 am - 4 am. Transaction gap between credit card numbers 8332 and 9551 were only three minutes. And all the three transactions were made without the use of loyalty cards.

Plotting the GPS path during the 10 pm - 11 pm to see if any car id had passed by that way.

gps_path_selected <- gps_path3 %>%
  filter(date > '2014-01-09', date < '2014-01-11', hour == 22)
tmap_mode("view")
tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1, g = 2, b = 3,
         alpha = NA,
         saturation = 1,
         interpolate = TRUE,
         max.value = 255) +
  tm_shape(gps_path_selected) +
  tm_lines(col="red") 

The last 4 digit of the credit card is 2463 for the transaction between 10pm-11pm on 10/01/2014 Friday at location Hippokampos. But surprisingly the particular location is not seen in the map.

Below GPS path had been plotted for car id 28.

gps_path_selected <- gps_path3 %>%
  filter(id == 28)
tmap_mode("view")
tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1, g = 2, b = 3,
         alpha = NA,
         saturation = 1,
         interpolate = TRUE,
         max.value = 255) +
  tm_shape(gps_path_selected) +
  tm_lines(col="red") 

By glazing through the routes visited by the employee car 28, there is an evident difference observed in the GPS path of car ID 28. Other car IDs have some regular pattern but the path taken by car ID 28 is highly disoriented and has many off-road impressions.

gps_path_selected <- gps_path3 %>%
  filter(date=='2014-01-13' & hour>=19 & hour<20)
tmap_mode("view")
tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1, g = 2, b = 3,
         alpha = NA,
         saturation = 1,
         interpolate = TRUE,
         max.value = 255) +
  tm_shape(gps_path_selected) +
  tm_lines(col="red") 

Car IDs - 13,15,16,34 GPS points impression can be seen around Frydos Autosupply on 13/01/2014 between 7pm-8pm as there was an unusal transaction. It will be further analysed at a later stage.